16 - Artificial Intelligence II [ID:57518]

So, we were talking about another machine learning problem, which is linear regression

and classification.

So, completely different.

The same setup, slightly different application.

The idea is that we learned from PAC learning last time that it may be a good idea to have small, learnable hypothesis spaces.

We're taking this to the radical conclusion, basically saying: oh, if we have a time series that does something, we're just going to assume that we can find a trend, a linear function that approximates it best.

How are we going to do that?

Now, what we did yesterday was kind of a first training exercise.

Let's do it for a simple case.

Univariate linear regression.

It becomes a little bit more interesting if we actually have higher dimensions.

Multivariate regression.

Namely, we now have linear functions from vectors to maybe vectors, maybe a single real value.

And then this becomes a bit more useful.
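
To fix notation (this is my gloss, not a quote from the lecture), such a linear hypothesis with weights w is just a bias weight plus a weighted sum of the inputs, h_w(x) = w0 + w1 x1 + ... + wn xn; as a tiny Python sketch:

```python
def h(w, x):
    """Linear hypothesis: bias weight w[0] plus a weighted sum of the inputs x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```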

But in the easy case, things are very simple.

And the poster child is that you have some kind of x-y plot, like house prices versus the size of the house somewhere.

We have some data points and we find a trend in this.

How do we compute this?
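
A minimal sketch of that poster-child case, with made-up house sizes and prices purely for illustration, just to show what "finding the trend" means:

```python
import numpy as np

# Hypothetical data: house sizes in square metres, prices in thousands of euros.
sizes  = np.array([50.0, 70.0, 80.0, 100.0, 120.0, 150.0])
prices = np.array([150.0, 200.0, 230.0, 280.0, 330.0, 400.0])

# Fit a degree-1 polynomial, i.e. a straight trend line, by least squares.
slope, intercept = np.polyfit(sizes, prices, 1)
print(f"price ≈ {slope:.2f} * size + {intercept:.2f}")
```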

The idea is very, very simple.

What we do is loss minimisation, just as we've set the whole thing up.

And then in some cases, if the loss function is differentiable, we can basically just do what we did in high school: set all the partial derivatives to zero.

And in some cases, i.e. in the linear case, for quadratic distance loss, we can even have

closed form solutions.

How nice.
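
For the linear case with quadratic loss, that closed-form solution is the normal equation w = (XᵀX)⁻¹ Xᵀ y; here is a small sketch with a made-up design matrix (the data and feature choice are arbitrary, only the formula matters):

```python
import numpy as np

# Hypothetical design matrix: one row per example, first column is the constant bias feature.
X = np.array([[1.0,  50.0, 2.0],
              [1.0,  80.0, 3.0],
              [1.0, 100.0, 3.0],
              [1.0, 120.0, 4.0]])
y = np.array([150.0, 230.0, 280.0, 330.0])

# Normal equation: solve (X^T X) w = X^T y for the loss-minimising weights.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)
```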

If we have more interesting loss functions, we can still do things: then we do gradient descent.

And the nice thing about this particular case is that we very often have convex loss surfaces over the weight space.

And on convex surfaces, like a distorted bowl, we don't have local minima other than the global one.

And so we're in a good case, which actually means that hill climbing, which is what the search algorithm behind gradient descent really is, is guaranteed to find the global optimum.

We don't have to do things like simulated annealing, where we do random restarts with funny temperature schedules and settling steps.

But we can just basically start anywhere, follow the steepest descent, just like a ball would, and find the global minimum.

So that, in a way, is the good case.

And we can actually use this gradient descent update, which we can compute via the partial derivatives of the loss function, and just search for the minimum with a local search algorithm.
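
For squared loss, that update is w ← w - α ∇L(w); a minimal batch-gradient-descent sketch (the step size and iteration count are arbitrary choices, and features may need scaling for these values to converge):

```python
import numpy as np

def gradient_descent(X, y, alpha=1e-4, steps=10_000):
    """Batch gradient descent for linear regression with mean squared loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # partial derivatives of the loss w.r.t. the weights
        w -= alpha * grad                      # step against the gradient
    return w
```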

And there are various refinements; you can imagine that linear regression is something you use all the time, so there's been a lot of engineering.

You can do it one step at a time, or with multiple searchers, which would be something like beam search, or with multiple steps at a time, at different step sizes, and have the step sizes decay, and so on.

All of those things are practically important.
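
One of those engineering knobs, as a hedged illustration only (the 1/(1 + decay*t) schedule and the batch size are just one common choice, not necessarily what is meant here): mini-batch updates with a decaying step size.

```python
import numpy as np

def sgd(X, y, alpha0=0.01, decay=0.001, epochs=50, batch_size=2, seed=0):
    """Mini-batch gradient descent with a simple 1/(1 + decay*t) step-size decay."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        order = rng.permutation(len(y))             # visit the examples in a fresh random order
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= alpha0 / (1.0 + decay * t) * grad  # shrinking step size over time
            t += 1
    return w
```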
